A measurement system is capable if it produces
measurements with uncertainties small enough
for demonstration of compliance with product
specifications. To establish the capability of
a system for Rockwell C scale hardness, one must assess
measurement uncertainty and, when hard- 
ness is only an indicator, quantify the re- 
lation between hardness and the product 
property of real interest. The uncertainty 
involves several components, which we des- 
ignate as lack of repeatability, lack of re- 
producibility, machine error, and indenter 
error. Component-by-component assess- 
ment leads to understanding of mechanisms 
and thus to guidance on system upgrades 
if these are necessary. Assessment of some 
components calls only for good-quality 
test blocks, and assessment of others re- 
quires test blocks that NIST issues as 



Standard Reference Materials (SRMs). The 
important innovation introduced in this 
paper is improved handling of the hardness 
variation across test-block surfaces. In 
addition to hardness itself, the methods in 
this paper might be applicable to other 
local measurements of a surface.
1. Introduction



Systems for hardness measurement do not follow ex- 
actly a protocol that one might choose. For this reason, 
there is a difference between the measurement one ob- 
tains and the measurement one wishes to obtain. To 
characterize this difference, one can perform test mea- 
surements and from these compute the measurement 
uncertainty. This paper shows how to do this. 

Rockwell C scale measurement is an indentation test 
[1,2]. A protocol for performing this test consists of 
prescriptions for choosing a properly shaped indenter, 
for driving the indenter into the material, and for calcu- 
lating the hardness value from the indentation depths. 
The indenter prescribed is a diamond cone with 120° 



cone angle blended in a tangential manner with a spherical
tip of 200 µm radius. Driving the indenter into the
material involves two levels of force, 98.07 N and 
1471 N. (These forces can be achieved with masses of 
10 kg and 150 kg.) First, as shown in Fig. 1, the smaller 
force is applied for a prescribed time interval and then 
the first indentation depth is measured. Second, the force 
is increased to the higher level in a way that drives the 
indenter into the material at a prescribed velocity. Third, 
the larger force is held for a prescribed interval. Fourth, 
the force is reduced back to the lower level and held for 
a prescribed interval after which the second indentation
depth is measured. Not everyone who chooses a Rockwell C
scale protocol chooses the same prescribed intervals and
velocity. The American Society for Testing and Materials
(ASTM) [1] and the International Organization for
Standardization (ISO) [2] only limit the choices allowed.
The hardness is calculated from the difference between the
second depth and the first. This difference is expressed in
multiples of 0.002 mm and subtracted from 100. For steels,
Rockwell C scale hardness values are typically between
25 HRC and 65 HRC.

Fig. 1. Rockwell C scale hardness test illustrated with change of force with
time (a) and resulting indentation depth (b). Indenter depth measurements
used in calculation of hardness value are indicated by X symbol.

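
The hardness calculation described above can be sketched in a few lines. This is only an illustration of the arithmetic; the function name and the example depths are assumptions made for the sketch, not values from the paper.

```python
def rockwell_c_hardness(first_depth_mm: float, second_depth_mm: float) -> float:
    """Return the HRC value from the two indentation depths, both in mm.

    The depth increase is expressed in multiples of 0.002 mm and
    subtracted from 100, as described in the text.
    """
    unit_mm = 0.002
    depth_increase_mm = second_depth_mm - first_depth_mm
    return 100.0 - depth_increase_mm / unit_mm


# Hypothetical depths: a 0.080 mm increase corresponds to 40 units.
example_hrc = rockwell_c_hardness(0.010, 0.090)
```

With these hypothetical depths the result is 60 HRC, which lies inside the 25 HRC to 65 HRC range quoted for steels.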
In manufacturing, one generally indexes measure- 
ment capability by contrasting measurement uncer- 
tainty with the tolerance that the product is required to 
meet [3]. The tolerance is the difference between the 
upper and lower specification limits for the product 
property. In the case of hardness, tolerances are an issue 
because the bases on which they are set are often ob- 
scure. What is usually not made explicit is that hardness 
is an indicator of a product property critical to quality 
and that a tolerance on the critical property is the proper 
basis for a hardness tolerance. 



Two features distinguish the approach to uncertainty 
assessment presented here. The first is reliance entirely 
on test blocks, and the second is specification of block 
locations for test measurements. Although we do not 
deal with it here, one can obtain insight into hardness 
uncertainty by comparing what one considers to be the 
ideal protocol with the actual realization of this proto- 
col. Such an approach can be based on propagation of 
uncertainties [4]. For some deviations of the realization 
from the ideal, such as a deviation in applied forces, in 
indenter geometry, or in time intervals of force applica- 
tion, one can (1) characterize the deviation as an uncer- 
tainty, (2) obtain sensitivity coefficients that relate such 
deviations to the resulting hardness measurements, and 
(3) assess the corresponding component of the measure- 
ment uncertainty. The coefficients in the second step 
cannot be obtained mathematically as in the usual prop- 
agation of uncertainties but must be obtained experi- 
mentally. Rather than taking this three-step approach, 
we assess the contribution to the uncertainty of such 
deviations through use of the NIST Rockwell C scale
SRMs [5, 6]. Another possibility that we do not deal 
with here is arbitrary choice of measurement locations 
on test blocks. Commonly, guidelines recommend use of 
the average of five measurements taken somewhere on 
the test block (Annex H in [4]). The computations asso- 
ciated with this are based on the assumption that the 
locations were chosen randomly. Rather than consider- 
ing this possibility, we specify measurement locations 
and proceed accordingly. 
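
For reference, the three-step propagation approach mentioned earlier in this paragraph rests on the usual law of propagation of uncertainty for uncorrelated inputs. The notation below is assumed for this sketch and does not appear in the paper: $u_k$ is the standard uncertainty of deviation $k$ (force, indenter geometry, or timing), $c_k$ is the experimentally determined sensitivity coefficient, and $u_H$ is the resulting component of the hardness uncertainty.

```latex
u_H^2 = \sum_k c_k^2 \, u_k^2
```

The paper does not take this route; it assesses the combined effect of such deviations directly with the NIST SRMs.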

The usefulness of measurement uncertainty for deci- 
sion making increases when the relation between com- 
ponents of uncertainty and the physical mechanisms that 
affect the measurement system are clarified. In its orga- 
nization, this paper proceeds component by component 
giving for each component the needed test measure- 
ments and analysis. Although we do not define a compo- 
nent for each possible mechanism, the components we 
define can be related to groups of possible mechanisms 
and further distinctions are sometimes possible. Think- 
ing in terms of mechanisms is especially useful in de- 
ciding how one's hardness equipment might be up- 
graded. Such thinking is also important when one is 
trying to understand why the results of two hardness 
measurements differ. Not all mechanisms and thus not 
all uncertainty components necessarily contribute to the 
explanation of the difference between two measure- 
ments. Thinking in terms of mechanisms allows one to 
decide which components can properly be used to ex- 
plain an observed inconsistency. 

The mention of equipment upgrades raises the ques- 
tion of whether the expense is necessary. This involves 
not only the measurement uncertainty but also the 
product requirements that the measurement system is to 
ensure. This issue is the subject of the next section. 

The remaining sections explain how to assess experi- 
mentally each uncertainty component. The experimen- 
tal work makes use of test blocks, which introduce an- 
other source of variation, the nonuniformity of the 
hardness across test block surfaces. Dealing with this 
source of variation as one assesses the hardness uncertainty
components is the subject of Secs. 3-5.

As usually applied, Rockwell C scale hardness is a 
test method that indicates properties of an entire unit 
through measurement of only a small portion of the unit. 
Development of test methods with this characteristic is 
important beyond hardness applications. For example, 
under the name "combinatorial methods," there is con- 
siderable interest in experiments involving runs under 
many different conditions but with only a small quantity 
of material for each condition, a quantity so small that 
only an indicator of the desired product property can be 
measured. Because such experiments are usually per- 
formed with material samples laid out on a surface, the 



uncertainty analysis methods in this paper may be in- 
structive. 



2. Product Specification Limits 

Rockwell C scale hardness is used to test the safety of 
airplane landing gear, the strength of fasteners, the 
safety of gas cylinders under accidental impact, and the 
performance of rotary lawn mower blades. These 
product characteristics and hardness are related in the 
case of steel because both are affected by heat treat- 
ment. Excess heat treatment leads to landing gear that 
snaps, gas cylinders that shatter, and mower blades that
fracture. Insufficient heat treatment leads to landing 
gear that bends, gas cylinders that deform under stress, 
and mower blades that become dull rapidly. 

Clearly, some product characteristics measured by 
Rockwell C scale hardness are critical to quality. Manu- 
facturers of products for which this is true have little 
choice but to develop and maintain appropriate capabil- 
ity in hardness measurement. An alternative test method 
is, of course, a possibility, but such development would 
likely be very expensive. Thus, treating Rockwell hard- 
ness with disdain because of its simple protocol and 
long history is not an option. 

For some product characteristics but not hardness, the 
engineering knowledge needed to determine acceptable 
values of the characteristic, that is, to set specification 
limits, is available without further experiment. For ex- 
ample, say that the characteristic is the shape of a part. 
To calculate specification limits, one could envision how 
deviations from the ideal shape would affect the use of 
the part, identify various part dimensions that together 
can be employed to assure successful usage, and finally 
specify limits on these part dimensions. Similarly, say 
that the characteristic is the composition of some mate- 
rial. To calculate specification limits, one could antici- 
pate how various constituents would affect usage and set 
specification limits on the concentrations of these con- 
stituents. These are examples of product characteristics 
for which one can set specification limits theoretically. 

The case of Rockwell C scale hardness is different. In 
fact, this difference is sometimes emphasized by saying 
that determinations of dimension and of concentration 
are measurements whereas Rockwell hardness is a test 
method. As illustrated by Rockwell hardness, test- 
method protocols are sometimes quite removed from 
product characteristics of real interest. There is no quan- 
titative model that connects Rockwell hardness and the 
performance of airplane landing gear under stress. The 
only way to establish the connection is through experi- 
ment. In the case of landing gear, this effort involves 
determination of the relation between tensile strength 
and hardness. In fact, one can find publications that give
this relation [7], but one must be concerned with the 
reliability of the published relation and whether it holds 
for all classes of steel. It would seem that in many cases, 
the use of a test method to assure a product characteris- 
tic requires costly experimental work and that one would 
only do this work if the characteristic were critical to 
quality and a test method were the only way to gauge the 
characteristic. 

To compensate for problems in the experimental 
work connecting product characteristic and test method, 
one might think of tightening the specification limits 
until one is sure that test method results indicate satis- 
factory product performance. This reasoning may be 
behind some customer-supplier agreements that contain 
specification limits on Rockwell C scale hardness. 
Tightening specification limits, of course, results in 
more stringent requirements on the uncertainty of the 
test method. Reducing this uncertainty may be difficult. 
In this case, one must decide what experimental work to 
invest in: One could improve one's understanding of 
the specification limits or reduce one's uncertainty in 
use of the test method. 

The idea that specification limits drive uncertainty 
requirements is complicated in the case of Rockwell C 
scale hardness because of differences among uncer- 
tainty components. The most important difference is 
between the components that describe the variation of a 
particular testing system and the components that arise 
when one is comparing systems. Characterization of the 
former entails what is commonly referred to as gage 
repeatability and reproducibility (gage R&R) [3]. Char- 
acterization of the latter involves comparison of testing 
machines and indenters. 

Rockwell C scale measurement can be used to reduce 
the variation in a heat treatment process. Although re- 
duced variation alone does not guarantee that the parts 
produced will be satisfactory, such reduction is a rea- 
sonable first step. If this is one's goal, one should use 
only a single hardness measurement system. The mea- 
surement capability needed involves only the short and 
long term variation of this system. Thus, a gage R&R 
study is sufficient to characterize the uncertainty com- 
ponents of interest and thus to determine whether the 
hardness system is capable for the task. 

More broadly, one must consider the relation of one's
hardness measurements to the hardness measurements 
that were part of setting the specification limits. Gener- 
ally, these two sets of measurements will involve differ- 
ent machines and indenters. Despite the differences, the 
specification limits must apply to measurements from 
both. There might be just two systems involved, the 
supplier's and the customer's. Alternatively, both sys- 
tems might be referred to the hardness measurements 



made by NIST. In either case, the uncertainty compo- 
nents that characterize the machine and indenter errors 
enter the determination of measurement capability. Of- 
ten, these errors are larger than the errors apparent in a 
gage R&R study, and determining how to reduce them 
by upgrading a measurement system is more difficult. 



3. Gage R&R 

3.1 Assessment With Nonuniform Blocks 

Repeatability, the first "R" in "R&R", is defined as 
"closeness of the agreement between the results of suc- 
cessive measurements of the same measurand carried 
out under the same conditions of measurement" [4]. 
Two stipulations in this definition, same measurand and 
same conditions, require care. Successive measure- 
ments of the same measurand cannot be realized be- 
cause a hardness measurement can be made only once 
at each test block location and test block hardness is not 
perfectly uniform. Thus, a way around block nonunifor- 
mity must be employed. In the definition, the words 
"same conditions" must be augmented by spelling out 
the conditions to be held constant. 

For the dead-weight, Rockwell-hardness machine at 
the NIST [6], we specify that "successive measurements
... under the same conditions" means an uninterrupted
sequence of measurements made with the same
indenter and machine configuration. Our use of the 
word "uninterrupted" is intended to imply that the mea- 
surements are made one after another over a short pe- 
riod of time. "Made with the same machine configura- 
tion" means made without changing the test cycle 
sketched in Fig. 1 or changing the setting of the ma- 
chine in any other way. Because the NIST machine has 
a flat anvil, we add the proviso that in the course of the 
measurements, the block not be removed from the anvil. 
For testing machines of other designs, other provisos 
might be suitable. Subsequently in this paper, we fre- 
quently stipulate that measurement conditions be held 
constant as in repeatability assessment. 

We define repeatability in terms of measurements on 
a perfectly uniform test block and then show how re- 
peatability can be estimated with nonuniform test 
blocks. The key is prescription of measurement loca- 
tions on the test blocks. The prescribed locations must 
be far enough apart to avoid the hardening caused by the 
residual stress left by an indentation and must be sym- 
metrical and close enough that deviations due to block 
nonuniformity cancel. Most of the methods presented in 
this section are based on measurement patterns com- 
posed of hexagons with 6 mm between adjacent ver- 
tices. The distance 6 mm seems large enough to avoid 
the crowding effect and small enough to allow sufficient
indents on a block. Measuring at the vertices and
center of a 6 mm hexagon requires that one develop 
one's experimental technique. This can be done with 
some effort. 

That block nonuniformity cannot be ignored when 
the machine has good repeatability is illustrated by the 
measurements that are presented in Fig. 2 as a his- 
togram. NIST obtained these 83 values by measuring 
one of the SRM blocks, 95N63001, on a square 5 mm 
grid. The measurement spread shown in this histogram 
is caused by both lack of repeatability and block nonuni- 
formity. These two sources can be distinguished if the 
location of the measurements is taken into account. 
Block nonuniformity is largely characterized by smooth 
variation with location because the major causes of this 
nonuniformity, variation in the material and the heat 
treatment applied to it, vary smoothly. Thus, the hard- 
ness contour plot in Fig. 3 indicates how much the block 
nonuniformity contributes to the measurement spread 
shown in Fig. 2. Figure 3 shows that this block is harder 
near its edges and softer in the middle. Moreover, the 
variation shown by the contours largely explains the 
spread in Fig. 2. Not only do Figs. 2 and 3 illustrate the 
possible impact of block nonuniformity on uncertainty 
assessment, they also illustrate that the basis for dealing 
with block nonuniformity is the smooth variation of 
hardness with location. 



3.2 Trend Elimination 

As the basis for the methods in this section, we as- 
sume that hardness variation over a 6 mm hexagon re- 
sembles variation with a constant gradient. (See Sec 3.8 
for further discussion of this approximation.) Figure 4 
shows a 6 mm hexagon with numbered vertices that is 
embedded in an equilateral triangular grid covering a 
block 52 mm in diameter. Say that we measure the cen- 
ter point, number 7, and a cross-hexagon pair, perhaps 
vertices 1 and 4, holding measurement conditions con- 
stant as in repeatability assessment. If the hardness has 
a constant gradient, then, except for the variation due to 
lack of repeatability, the center measurement equals the 
average of the other two measurements. By examining 
the deviation from equality, we can assess the re- 
peatability without concern for block nonuniformity. In 
addition, arrangement of measurements on a 6 mm 
hexagon can be used to eliminate block nonuniformity 
in system comparisons such as comparison of indenters. 
If we choose conditions for the center measurement that 
differ from those for the other two, then we can deter- 
mine the effect of this difference hindered only by the 
lack of repeatability, not by block nonuniformity. 
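
The cancellation argument above can be checked numerically. The following is a minimal sketch, assuming exact hexagon geometry and an arbitrary, hypothetical constant-gradient hardness surface; the function names and coefficients are illustrative and not from the paper.

```python
import math


def hexagon_points(side_mm: float = 6.0):
    """Return (x, y) in mm for vertices 1..6 (clockwise from the left) and center 7."""
    height = side_mm * math.sqrt(3) / 2
    half = side_mm / 2
    return [
        (-side_mm, 0.0), (-half, height), (half, height),
        (side_mm, 0.0), (half, -height), (-half, -height),
        (0.0, 0.0),
    ]


def plane(b0: float, b1: float, b2: float, point) -> float:
    """Hardness on a surface with constant gradient: b0 + b1*x + b2*y."""
    x, y = point
    return b0 + b1 * x + b2 * y


points = hexagon_points()
# Hypothetical coefficients, chosen only to illustrate the cancellation.
hardness = [plane(45.0, 0.01, -0.02, p) for p in points]

# Cross-hexagon pairs are (1, 4), (2, 5), (3, 6); list indices are zero-based.
pair_averages = [(hardness[i] + hardness[i + 3]) / 2 for i in range(3)]
center = hardness[6]
```

In the absence of measurement error, every entry of `pair_averages` equals `center`, whatever coefficients are chosen, because opposite vertices have coordinates of opposite sign.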

The foregoing can be expressed mathematically. As 
shown in Fig. 4, let the locations on the outside of the 
hexagon be numbered clockwise from 1 to 6, and let the 
center be numbered 7. The 3 cross-hexagon pairs are 
vertices 1 and 4, vertices 2 and 5, and vertices 3 and 6, 
respectively. The x-y coordinates of the locations are
approximately given in millimeters by (-6,0), (-3,5),
(3,5), (6,0), (3,-5), (-3,-5), and (0,0). Let the seven
hardness measurements be denoted $H_i$, $i = 1,\ldots,7$. Let
each cross-hexagon pair be measured under constant
conditions as in repeatability assessment. The comparisons
are among $(H_i + H_{i+3})/2$, $i = 1,\ldots,3$, and $H_7$. If the
block nonuniformity has a constant gradient, then the
actual hardness as a function of x-y coordinates on the
surface is given by $b_0 + b_1 x_i + b_2 y_i$ for some coefficients
$b_0$, $b_1$, and $b_2$. We see that nonuniformity with this
dependence on location does not enter the comparisons.

Fig. 2. Histogram of measurements taken on block 95N63001.

Fig. 3. Contour plot for block 95N63001 obtained with commercial software.

It is useful to express the hardness measurements in 
a way that makes explicit the effect of the lack of re- 
peatability. Consider the general case of system com- 
parisons. In this case, each cross-hexagon pair is mea- 
sured under the same conditions, but different pairs and 



the center point may be measured under different condi- 
tions. We denote the system conditions for vertices 1 and 
4 by A, the conditions for 2 and 5 by B, the conditions 
for 3 and 6 by C, and the conditions for the center point 
by D. We have 

$$(H_1 + H_4)/2 = \beta_A + (\epsilon_1 + \epsilon_4)/2$$

$$(H_2 + H_5)/2 = \beta_B + (\epsilon_2 + \epsilon_5)/2$$

$$(H_3 + H_6)/2 = \beta_C + (\epsilon_3 + \epsilon_6)/2$$

$$H_7 = \beta_D + \epsilon_7$$

The $\epsilon_i$, $i = 1,\ldots,7$, denote the effects of the lack of
repeatability. We model these seven effects as statistically
independent with zero mean and standard deviation $\sigma$.
We can think of $\beta_A$, $\beta_B$, $\beta_C$, $\beta_D$ as the hardness values
one would obtain at the center point of the hexagon
under the four sets of conditions and in the absence of
any effect of lack of repeatability. If the sets of conditions
were the same, these values would be the same.

Fig. 4. A 6 mm equilateral triangular grid for a block with a hexagonal
measurement pattern highlighted and numbered.

If we knew the value of $\sigma$, then we could obtain a
confidence interval for $\beta_A - \beta_B$, say. The standard deviation
of two-location averages is $\sigma/\sqrt{2}$. Recall that the
variance (the square of the standard deviation) of a sum
of independent errors is given by the sum of the variances
of the individual errors. A 95 % confidence interval
for $\beta_A - \beta_B$ is given by

$$(H_1 + H_4)/2 - (H_2 + H_5)/2 \pm 1.96\,\sigma.$$

The formula that applies when the center value is involved
is somewhat different. A 95 % confidence interval
for $\beta_A - \beta_D$, say, is given by

$$(H_1 + H_4)/2 - H_7 \pm 1.96 \sqrt{3/2}\,\sigma.$$

Other comparisons are analogous to one or the other of
these formulas. Note that whether or not these comparisons
are satisfactory for the purpose depends on the size
of the standard deviation $\sigma$. If the standard deviation is
too large, more precise comparisons can be made by 
improving the repeatability of the machine or by repeat- 
ing the measurements on hexagons laid out elsewhere 
on the block. 
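
The two interval formulas can be written as small helper functions. This is a sketch that assumes the repeatability standard deviation is known; the function names and the readings in the example are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile used in the text for 95 % intervals


def ci_pair_vs_pair(h_a1, h_a2, h_b1, h_b2, sigma):
    """95 % interval for the difference between two cross-hexagon pair averages.

    The difference of two pair averages has variance
    sigma^2/2 + sigma^2/2 = sigma^2.
    """
    diff = (h_a1 + h_a2) / 2 - (h_b1 + h_b2) / 2
    half_width = Z_95 * sigma
    return diff - half_width, diff + half_width


def ci_pair_vs_center(h_a1, h_a2, h_center, sigma):
    """95 % interval for a pair average minus the center measurement.

    The variance is sigma^2/2 + sigma^2 = 3*sigma^2/2.
    """
    diff = (h_a1 + h_a2) / 2 - h_center
    half_width = Z_95 * math.sqrt(3 / 2) * sigma
    return diff - half_width, diff + half_width


# Hypothetical readings in HRC with an assumed sigma of 0.03 HRC.
low, high = ci_pair_vs_pair(45.10, 45.06, 45.02, 45.04, sigma=0.03)
```

An interval that contains 0 gives no evidence that the two sets of conditions differ.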



3.3 Assessing Repeatability

Consider measurement of all seven points of a 6 mm
hexagon under constant conditions as in repeatability
assessment. As explained in the previous section, if the
block nonuniformity has constant gradient, only the lack
of repeatability causes $(H_1 + H_4)/2$, $(H_2 + H_5)/2$,
$(H_3 + H_6)/2$, and $H_7$ to differ. Moreover, only lack of
repeatability causes $H_1 + H_3 + H_5 - H_2 - H_4 - H_6$ to differ
from 0. Thus, one can estimate $\sigma$, that is, assess the
repeatability, from these quantities.

Usually, one estimates the variance $\sigma^2$ and denotes
the estimate by $s^2$. Let the mean of the seven readings
be

$$\bar{H} = \frac{1}{7} \sum_{i=1}^{7} H_i.$$

The variance estimate is given by

$$s^2 = \frac{1}{4} \left\{ [H_7 - \bar{H}]^2 + 2 \sum_{i=1}^{3} [(H_i + H_{i+3})/2 - \bar{H}]^2 + [H_1 + H_3 + H_5 - H_2 - H_4 - H_6]^2/6 \right\}.$$

Because this estimate has only 4 degrees of freedom, its
variability must be taken into account when it is used to
draw conclusions. If the corresponding standard deviation
$s$ were to be used instead of $\sigma$ to form confidence
intervals, then the appropriate value from the Student's
t-table, 2.776, would replace the 1.96 shown above.

For purposes such as reporting the lack of repeatability
of a testing machine, one would want an estimate of
$\sigma$ with more degrees of freedom. One way to do this is
to measure more than one hexagon with the same indenter,
say $m$ hexagons. Let the hardness measurement for
location $i$ on hexagon $j$ be given by $H_{ji}$. Proceeding as
above for each hexagon, we obtain an estimate of $\sigma$ for
each hexagon, which we denote by $s_j$. The overall estimate
of the standard deviation is given by

$$s = \left( \frac{1}{m} \sum_{j=1}^{m} s_j^2 \right)^{1/2}.$$

This estimate has $4m$ degrees of freedom. More generally,
if the estimate $s_j$ had $\nu_j$ degrees of freedom, then the
overall estimate would have

$$\sum_{j=1}^{m} \nu_j$$

degrees of freedom and would be given by

$$s = \left( \frac{\sum_{j=1}^{m} \nu_j s_j^2}{\sum_{j=1}^{m} \nu_j} \right)^{1/2}.$$

In many cases, the 12 degrees of freedom obtained from
3 hexagons is sufficient.
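
The repeatability estimate for one hexagon, and the pooling of estimates from several hexagons, can be sketched as follows. The code assumes the seven readings are listed in the vertex order of Fig. 4 with the center last. The function names are illustrative, and the degrees-of-freedom weighting in `pooled_std` is the standard pooled-variance form, assumed here for the general case.

```python
import math


def hexagon_variance(h):
    """Variance estimate s^2 (4 degrees of freedom) from readings [H1, ..., H7]."""
    if len(h) != 7:
        raise ValueError("expected seven readings: six vertices and the center")
    mean = sum(h) / 7
    center_term = (h[6] - mean) ** 2
    # Cross-hexagon pairs (1,4), (2,5), (3,6) in zero-based indexing.
    pair_term = 2 * sum(((h[i] + h[i + 3]) / 2 - mean) ** 2 for i in range(3))
    alternating = h[0] + h[2] + h[4] - h[1] - h[3] - h[5]
    return (center_term + pair_term + alternating ** 2 / 6) / 4


def pooled_std(variances, dofs=None):
    """Pool per-hexagon variance estimates, weighting by degrees of freedom.

    With the default of 4 degrees of freedom per hexagon, this reduces to
    the square root of the plain average of the variances.
    """
    if dofs is None:
        dofs = [4] * len(variances)
    weighted = sum(v * s2 for v, s2 in zip(dofs, variances))
    return math.sqrt(weighted / sum(dofs))
```

A useful check on any implementation is that adding a constant-gradient trend to the seven readings leaves `hexagon_variance` unchanged, since each of the three terms is insensitive to such a trend.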

Figure 5 shows, for each of three hardness levels, the 
measurement results on three hexagons given as devia- 
tions from Hj . The units are hardness on the Rockwell 
C scale (HRC). The center is plotted with a circle, and 
members of different cross-hexagon pairs are plotted 
with triangles, squares, and diamonds, respectively. The 
variation shown is due to block nonuniformity and lack 
of repeatability. Nonuniformity with a steep gradient 
would lead to some pairs exhibiting large deviations in 
opposite directions from the center line. There is evi- 
dence of this. Members of pairs do not lie equal dis- 
tances from the center line because of lack of repeatabil- 
ity. 

For the three HRC levels 25, 45, and 63 the standard 
deviation estimates s for the variation due to lack of 
repeatability are 0.029 HRC, 0.033 HRC, and 0.024 
HRC, respectively. Each of these estimates has 12 de- 
grees of freedom. These estimates do not provide 
definitive evidence that the repeatability varies with 
hardness level. 
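
One rough way to see why the evidence is not definitive is to look at the ratio of the largest to the smallest variance estimate. The sketch below uses only the three values of s quoted above. The comparison with an F distribution is an assumption made for illustration and is not an analysis reported in the paper.

```python
# Standard deviation estimates from the text, keyed by nominal HRC level.
s_by_level = {25: 0.029, 45: 0.033, 63: 0.024}

variances = {level: s ** 2 for level, s in s_by_level.items()}

# Ratio of largest to smallest variance estimate (each has 12 degrees of freedom).
ratio = max(variances.values()) / min(variances.values())
```

The ratio is about 1.89. For a single pair of independent estimates with 12 degrees of freedom each, the tabulated upper 5 % point of the F distribution is roughly 2.7, so a ratio of this size is not unusual, and choosing the extreme pair out of three makes it less unusual still.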



3.4 Long-Term Variation 

Comparison of cross-hexagon averages with center 
measurements obtained previously is one way to moni- 
tor the long-term variation of a hardness measurement 
system. One begins by measuring the centers of enough 
hexagons to cover the period during which monitoring 
is to occur. This may require laying out hexagons on 
several blocks of the same hardness level. One measures 
the centers under the constant conditions necessary to 
avoid any variation beyond the inevitable lack of re- 
peatability. Then, over the monitoring period, one mea- 
sures cross-hexagon pairs and compares pair averages 
with the center. Note that this approach cancels both 
block nonuniformity and block-to-block variation. For 
this reason, we can combine readings from hexagons on 
different blocks in the same way we combine readings 
from hexagons on the same block. 

To describe this approach mathematically, let the
outer points of each hexagon be indexed by $i$ as above,
and let $j$ index the hexagons. Denote the center measurement
for hexagon $j$ by $H_{0j}$. The order in which the
cross-hexagon pairs are measured during the monitoring
is important. It is perhaps best to measure one pair
on each hexagon before returning to measure a second
pair on any hexagon. Thus, for some ordering of the
hexagons, one measures a pair on each hexagon, then a
second on each, and finally a third. For each monitoring
point, one can place on a run chart the deviation of the
cross-hexagon average from the center-point measurement

$$(H_{ji} + H_{j(i+3)})/2 - H_{0j}.$$

Frequency of monitoring and the measurement condi- 
tions to be held constant during monitoring are both 
issues to be decided. 
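
The monitoring scheme can be sketched as a deviation function plus a measurement schedule. The function names and the numbering of hexagons and pairs from 1 are assumptions made for this illustration.

```python
def monitoring_deviation(h_vertex, h_opposite, h_center):
    """Deviation of a cross-hexagon pair average from the stored center reading."""
    return (h_vertex + h_opposite) / 2 - h_center


def pair_schedule(num_hexagons):
    """Return (hexagon j, pair i) tuples in the order suggested in the text.

    One pair is measured on every hexagon before a second pair is
    measured on any hexagon, and likewise for the third pair.
    """
    return [(j, i) for i in (1, 2, 3) for j in range(1, num_hexagons + 1)]


# With two hexagons, the first pair is measured on both before the second pair.
schedule = pair_schedule(2)
```

Each entry in the schedule yields one point on the run chart, so m hexagons supply 3m monitoring points.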

For each hardness level, Fig. 6 shows a run chart with 
these deviations plotted versus time. On the day the 
center measurements were made, three hexagons for 
each level were measured fully to assess the repeatabil- 
ity. These measurements are the ones shown in Fig. 5. 
From these data, we computed the nine deviations 
shown at day 0. A monitoring deviation was obtained 
each day the machine was to be used, but as shown, the 
machine was used irregularly during the period por- 
trayed. On some days, two or even three monitoring 
deviations were obtained. All of these are shown in Fig. 
6. 

The run charts in Fig. 6 are worrisome because there 
seem to be special causes that cannot be easily identi- 
fied. Clearly, the machine was operating differently on 
the day that the center measurements were made. For 
this reason, almost all the deviations are positive. More- 
over, during the period covered by the last part of the 
chart, the deviations seem to fall and then recover. Concern
about the causes of these appearances is reinforced
by the fact that they appear to some degree at each 
hardness level, although they are most pronounced at the 



lowest level. The first place one might look for causes is 
in the mechanical operation of the testing machine. One 
would like to institute some procedure for using the 
machine that would assure that whatever the cause, its 
effect would be limited. Some such procedures, for ex- 
ample, warm up procedures, are already in place but 
may not be fully effective. In particular, the NIST ma- 
chine provides a trace of indenter depth versus time for 
each measurement. At the beginning of each day, this 
trace is used to detect the need to adjust the moving 
parts of the machine so that the forces are applied at the 
proper rates. After some thought, we have concluded 
that although Fig. 6 is worrisome, the effects are not 
pronounced enough to permit identification of the 
causes. 

3.5 Assessing Reproducibility 

Parallel to the definition of repeatability, reproduci- 
bility, the second "R" in "R&R", is defined as 
"closeness of the agreement between the results of suc- 
cessive measurements of the same measurand carried 
out under changed conditions of measurement" [4]. 
What conditions change must be stated. In the operation 
of the NIST dead-weight Rockwell-hardness machine, 
the change of primary interest is the change from day to 
day that occurs with the same indenter and without 
changing the test cycle sketched in Fig. 1 or resetting the
machine parameters in any other way. Often, the
changed conditions in discussions of reproducibility involve
changes in operator although this is less important
with computer-controlled machines. There are also
other possibilities for defining changed conditions.

Fig. 6. Long-term variation shown for each day by the difference between a cross-hexagon pair and the
center.

Volume 105, Number 4, July-August 2000

Journal of Research of the National Institute of Standards and Technology

When one thinks of reproducibility, one usually 
thinks of sources of error in addition to the ones that 
cause lack of repeatability. Thus, one decomposes a 
hardness measurement $H$ into three terms: $\epsilon$, the term
that reflects the error sources that cause lack of repeatability;
$\delta$, the term that reflects the additional error
sources associated with the lack of reproducibility; and
$\mu$, the actual hardness biased by the machine and indenter
error. We have

$$H = \mu + \delta + \epsilon.$$

In thinking about a measurement system over time, one
has to note when each term in this equation changes. In
the case of our model of the NIST machine, the term $\epsilon$
changes with every new measurement; the term $\delta$
changes with each new day; and the term $\mu$ changes
only when the machine or indenter is changed intentionally.
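
The timing of the three terms can be made concrete with a small simulation. This is a sketch under assumptions that the text does not state: both error terms are drawn from normal distributions, and the function name, parameter names, and default seed are hypothetical.

```python
import random


def simulate_readings(mu, sigma_delta, sigma_eps, days, per_day, seed=0):
    """Simulate H = mu + delta + eps as a list of (day, reading) tuples.

    mu is fixed for the whole run (same machine and indenter),
    delta is drawn once per day, and eps is drawn for every measurement.
    Normal distributions are an assumption of this sketch.
    """
    rng = random.Random(seed)
    readings = []
    for day in range(days):
        delta = rng.gauss(0.0, sigma_delta)
        for _ in range(per_day):
            eps = rng.gauss(0.0, sigma_eps)
            readings.append((day, mu + delta + eps))
    return readings
```

Setting `sigma_eps` to zero makes all readings within a day identical, which isolates the day-to-day term and shows how the two sources of variation separate.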

The effect of the additional sources of error is to add
a term to the cross-hexagon pair $(H_{ji} + H_{j(i+3)})/2$, which
we denote $\delta_{ji}$, and to add a term to the center measurement
$H_{0j}$, which we denote $\delta_0$. Say that we use just one
cross-hexagon pair for each monitoring point, which we
assume involves changes in the additional error sources.
Denoting the deviations by $D_{ji}$, we have

$$D_{ji} = (H_{ji} + H_{j(i+3)})/2 - H_{0j} = \delta_{ji} - \delta_0 + (\epsilon_{ji} + \epsilon_{j(i+3)})/2 - \epsilon_{0j}.$$

On the right side of this equation, the error terms are
statistically independent if their subscripts differ. We see
that the run chart has a constant offset because $\delta_0$ never
changes. Moreover, there is some dependence between
$D_{ji}$ and $D_{ji'}$ because $\epsilon_{0j}$ only changes with $j$.

Typically, reproducibility is assessed through the use 
of a standard deviation estimate. This implies that the 
changes incorporated in the definition of reproducibil- 
ity must have effects that are reasonably portrayed as 
random with constant standard deviation. Think of what 
a run chart fashioned as those discussed above would 
look like if this were true. Say that the conditions that 
change in accordance with the reproducibility definition 
change with every new point on the chart. Then, with 
one minor departure, the deviations portrayed by the run 
chart would appear to vary randomly without the 
amount of variation changing appreciably. The depar- 
ture is the dependence discussed above caused by three 
run chart points having a common center measurement. 



As an aside, note that one could form a run chart with 
changed conditions only every second point or every 
third point. Such an alternative is reasonable but re- 
quires a somewhat different reproducibility assessment. 

If the lack of reproducibility is reasonably summarized
by a standard deviation, then we can add control
limits to the run chart of the deviations. Say that we have
$n$ deviations available, which for generality, we take to
have been obtained from one or more cross-hexagon
pairs on several hexagons. The control limits should be
centered at

$$\bar{D} = \frac{1}{n} \sum_{j} \sum_{i} D_{ji} ,$$

where the sums cover the pairs from various hexagons
for which deviations are available. Using 3 times the
standard deviation, we obtain for the control limits

$$\bar{D} \pm 3 \sqrt{\frac{1}{n-1} \sum_{j} \sum_{i} (D_{ji} - \bar{D})^2} .$$

These control limits account for the repeatability and the 
reproducibility of the measurement system. Points out- 
side the control limits would indicate a special cause of 
variation to be investigated. Note that the control limits 
do not have to be re-estimated even though the monitor- 
ing involves hexagons from several blocks. 

We now have a model for hardness measurements
with two sources of uncertainty, $\varepsilon$ and $\delta$. The variance
of $\varepsilon$ is denoted by $\sigma^2$ and is estimated by $s^2$ as given in
Sec. 3.3. The variance of $\delta$ is denoted by $\sigma_\delta^2$. An
estimate of $\sigma_\delta^2$ can be obtained from the deviations $D_{ji}$. We
have

$$s_\delta^2 = \frac{1}{n-1} \sum_{j} \sum_{i} (D_{ji} - \bar{D})^2 - \frac{3 s^2}{2} .$$

Note the subtraction of $3s^2/2$ to remove from the
variability observed on the control chart the part
attributable to the lack of repeatability. The stability of
this estimate might be seriously compromised if from
deviation to deviation, measurement conditions do not
change as envisioned in the definition of reproducibility.
Note in particular that if the change envisioned is the
change from day to day, then measuring several
cross-hexagon pairs within a day will not do much to provide
a better estimate of $\sigma_\delta^2$.
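
As an illustration of the control-chart arithmetic described above, the following is a minimal Python sketch, not the authors' software. It assumes that the standard deviation of the deviations is the usual sample standard deviation with divisor n - 1; the function names, the deviations in `D`, and the repeatability variance `s2` are hypothetical values chosen only to make the example run.

```python
import numpy as np

def control_limits(deviations):
    """Return (lower, center, upper) 3-sigma limits for a run chart of deviations."""
    d = np.asarray(deviations, dtype=float)
    center = d.mean()
    sd = d.std(ddof=1)  # assumption: sample standard deviation, divisor n - 1
    return center - 3.0 * sd, center, center + 3.0 * sd

def reproducibility_variance(deviations, s2_repeat):
    """Sample variance of the deviations minus 3*s^2/2, the repeatability share."""
    d = np.asarray(deviations, dtype=float)
    return d.var(ddof=1) - 1.5 * s2_repeat

# Hypothetical deviations in HRC, one cross-hexagon pair per day.
D = [0.02, -0.05, 0.04, 0.07, -0.01, -0.06, 0.03, 0.00]
s2 = 0.02 ** 2  # hypothetical repeatability variance from a separate experiment

lower, center, upper = control_limits(D)
s2_delta = reproducibility_variance(D, s2)
print(lower, center, upper, s2_delta)
```

With few deviations or good reproducibility, the subtraction can yield a negative value; in that case the data give no evidence of a reproducibility component beyond the lack of repeatability.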

The desirability of randomness in the sequence of
deviations $D_{ji}$ arises from the idea that randomness
implies that each deviation is the effect of many causes
none of which is dominant. If there are many causes,
then one expects the statistical properties of the devia- 
tions, their mean and standard deviation, to remain con- 
stant in the future. What is worrisome about Fig. 6 is that 
there is some indication that a few dominant causes exist 
that might lead to deviations in the future that are unlike 
anything observed in Fig. 6. 

Is a standard deviation an appropriate summary for
Fig. 6? We computed $s_\delta$ using only one value of $D_{ji}$ from
days during which two or three were measured. We
obtained for the lowest level 0.056 HRC, for the middle
level 0.038 HRC, and for the highest level 0.036 HRC.
This standard deviation does gauge the day-to-day
variation, the error component called lack of reproducibility,
observed in Fig. 6. This component seems to be largely
the effect of day-to-day variations in the way the
machine moves in executing the test cycle. What is
worrisome is that $s_\delta$ may be too small to cover this error
component as it might appear in future measurements.
Another caution is that these results might be misleading
in the interpretation of combinations of measurements
made over successive days because Fig. 6 seems to
indicate some lack of randomness, that is, some day-to-day
dependence.



3.6 Comparison and Assessment

Consider now the comparison of indenters and testing
machines. Generally, the notion that changes in
indenters or machines are changes that should be
incorporated in a definition of reproducibility seems contrived
because one usually has only a few indenters or
machines with which to experiment. Thus, a statement of
differences would seem to be more satisfying than a
standard deviation as a way of summarizing the
variation observed when the indenter or machine is changed.

The comparisons we consider are set up so that only
the lack of repeatability enters. One could do
comparisons as suggested in Sec. 3.2 and use the repeatability
assessment in Sec. 3.3 to obtain confidence intervals.
However, for efficiency, one might compare indenters or
machines and assess the repeatability in the same
experiment. Say one has 3 indenters to be compared using the
14 locations available in 2 hexagons. Let the indenters be
denoted A, B, and C. In each hexagon, we assign
indenter A to vertices 1 and 4, B to 2 and 5, and C to 3 and
6. Further, we assign A to the center of hexagon 1 and
B to the center of hexagon 2. As above, we denote the
hardness reading for location $i$ and hexagon $j$ by $H_{ji}$.

Analysis of this experiment should be done with a
program for multiple regression, which most readers
will find available on their computers. A multiple
regression program models the observations in terms of
parameters and errors. In general, the model is given by

$$y = X\beta + \varepsilon ,$$

where $y$ is the vector of observations, $X$ is referred to as
the design matrix, $\beta$ is the vector of parameters, and $\varepsilon$
is the vector of unknown errors. These errors are due to
lack of repeatability in the case considered here. A
multiple regression program requires input of $y$ and $X$;
it returns an estimate of $\beta$, which we denote by $\hat{\beta}$.

One might expect that we would take the 14 hardness
measurements $H_{ji}$; $j = 1, 2$; $i = 1, \ldots, 7$; as the elements of
$y$, but this leads to the need for elements in $\beta$ that model
the block nonuniformity across the hexagons. Instead,
we have decided to confine the elements of $\beta$ to $\beta_A$, $\beta_B$,
$\beta_C$, the hardness readings with indenters A, B, and C,
respectively, that one would obtain under perfect
repeatability at the center of hexagon 1, and $\Delta$, the
difference in hardness between the centers of hexagons 1 and
2. We have

$$\beta = \begin{pmatrix} \beta_A \\ \beta_B \\ \beta_C \\ \Delta \end{pmatrix} .$$



So that we can confine $\beta$ to these 4 parameters, we must
adopt as elements of $y$ the combinations of hardness
readings that we introduced in Sec. 3.2. We take as
observations the across-hexagon averages and the center
reading scaled so that their standard deviation is $\sigma$. We
add $(H_{11} + H_{13} + H_{15} - H_{12} - H_{14} - H_{16})/\sqrt{6}$ and the
corresponding quantity from the other hexagon so that the
regression routine includes them in the estimation of $\sigma$.
The data values to which the regression model is fit are
given by

$$y = \begin{pmatrix}
(H_{11} + H_{14})/\sqrt{2} \\
(H_{12} + H_{15})/\sqrt{2} \\
(H_{13} + H_{16})/\sqrt{2} \\
H_{17} \\
(H_{11} + H_{13} + H_{15} - H_{12} - H_{14} - H_{16})/\sqrt{6} \\
(H_{21} + H_{24})/\sqrt{2} \\
(H_{22} + H_{25})/\sqrt{2} \\
(H_{23} + H_{26})/\sqrt{2} \\
H_{27} \\
(H_{21} + H_{23} + H_{25} - H_{22} - H_{24} - H_{26})/\sqrt{6}
\end{pmatrix} .$$


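As a consequence of our choice of $\beta$ and $y$, the design
matrix is given by

$$X = \begin{pmatrix}
\sqrt{2} & 0 & 0 & 0 \\
0 & \sqrt{2} & 0 & 0 \\
0 & 0 & \sqrt{2} & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\sqrt{2} & 0 & 0 & \sqrt{2} \\
0 & \sqrt{2} & 0 & \sqrt{2} \\
0 & 0 & \sqrt{2} & \sqrt{2} \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix} .$$
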

The logic of the model we have adopted can be seen by
considering the equation $y = X\beta$, the model for $y$ with
error term removed. For example, from the sixth row,
we have

$$(H_{21} + H_{24})/\sqrt{2} = \sqrt{2}\,\beta_A + \sqrt{2}\,\Delta .$$

This shows that the first cross-hexagon average for
hexagon 2 is given by the hardness reading obtained
from indenter A in hexagon 1 plus the hardness
difference between hexagon 1 and hexagon 2. This illustrates
the parameterization that we have chosen.

To estimate the elements of $\beta$, we need a regression
program that runs without adding a constant term to the
model. (Adding a constant term consists of adding a
column with elements 1 to $X$ and another element to $\beta$.)
The proper regression program gives estimates of the
parameters, which we denote

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_A \\ \hat{\beta}_B \\ \hat{\beta}_C \\ \hat{\Delta} \end{pmatrix} .$$

Most programs will also produce an estimate of $\sigma$,
which in this case has 6 degrees of freedom. If not, one
can compute the residuals, which are given as the
elements of the vector $y - X\hat{\beta}$, compute the sum of squares
of the residuals, and divide the sum by the degrees of
freedom to obtain an estimate of $\sigma^2$. We denote this
estimate by $s^2$.

What we wish to investigate is the difference between
indenters, $\beta_A - \beta_B$, for example. We can easily estimate
this difference as $\hat{\beta}_A - \hat{\beta}_B$, but obtaining a confidence
interval is more involved. To express $\beta_A - \beta_B$ as the dot
product of vectors, let

$$a_{AB} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}$$

so that $\beta_A - \beta_B = a_{AB}^T \beta$. As shown in texts on regression
[8], an estimate of the covariance matrix of the
parameter estimates is given by

$$s^2 (X^T X)^{-1} .$$

Better regression programs give this matrix. Using this
matrix, we obtain an estimate of the variance of the
difference $\hat{\beta}_A - \hat{\beta}_B$ given by $s^2 a_{AB}^T (X^T X)^{-1} a_{AB}$.
Similarly, the variances of other differences can be obtained.
Taking into account the 6 degrees of freedom in the
variance estimate, we obtain the 95 percent confidence
interval

$$\hat{\beta}_A - \hat{\beta}_B \pm 2.447\, s \sqrt{a_{AB}^T (X^T X)^{-1} a_{AB}} .$$
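
The regression just described can be sketched with NumPy as follows. This is an illustrative sketch rather than the program the authors used: the function name `compare_indenters`, the array layout (row = hexagon, column = location, with location 7 the center), and the readings in `H_demo` are assumptions made for the example. The multiplier 2.447 is the value quoted in the text for 6 degrees of freedom.

```python
import numpy as np

def compare_indenters(H):
    """Fit beta = (beta_A, beta_B, beta_C, Delta) for the 3-indenter, 2-hexagon design.

    H[j][i] is the reading at location i+1 of hexagon j+1; locations 1-6 are
    the vertices and location 7 is the center.
    """
    H = np.asarray(H, dtype=float)
    r2, r6 = np.sqrt(2.0), np.sqrt(6.0)
    y, rows = [], []
    for j in range(2):
        h = H[j]
        d = float(j)  # coefficient of Delta: 0 in hexagon 1, 1 in hexagon 2
        # Cross-hexagon pairs (1,4), (2,5), (3,6) carry indenters A, B, C.
        for k in range(3):
            y.append((h[k] + h[k + 3]) / r2)
            row = [0.0, 0.0, 0.0, r2 * d]
            row[k] = r2
            rows.append(row)
        # Center: indenter A in hexagon 1, indenter B in hexagon 2.
        y.append(h[6])
        row = [0.0, 0.0, 0.0, d]
        row[j] = 1.0
        rows.append(row)
        # Zero-expectation contrast; it only adds a degree of freedom for sigma.
        y.append((h[0] + h[2] + h[4] - h[1] - h[3] - h[5]) / r6)
        rows.append([0.0, 0.0, 0.0, 0.0])
    y, X = np.array(y), np.array(rows)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y               # least squares without a constant term
    resid = y - X @ beta
    dof = len(y) - X.shape[1]              # 10 observations - 4 parameters = 6
    s2 = resid @ resid / dof
    a_ab = np.array([1.0, -1.0, 0.0, 0.0])
    diff = a_ab @ beta
    half = 2.447 * np.sqrt(s2 * (a_ab @ XtX_inv @ a_ab))  # t quantile for 6 dof
    return beta, s2, (diff - half, diff + half)

# Hypothetical noise-free readings: beta_A=45.0, beta_B=45.5, beta_C=44.8, Delta=0.2.
H_demo = [
    [45.0, 45.5, 44.8, 45.0, 45.5, 44.8, 45.0],  # hexagon 1, center measured with A
    [45.2, 45.7, 45.0, 45.2, 45.7, 45.0, 45.7],  # hexagon 2, center measured with B
]
beta, s2, ci_ab = compare_indenters(H_demo)
print(beta, s2, ci_ab)
```

Because the demonstration data contain no noise, the fit recovers the four parameters and the interval collapses onto the true difference; with real readings the interval width reflects the lack of repeatability.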



As an example, consider comparison of the NIST 
primary indenter with four other indenters. To accom- 
modate this number of indenters, we implemented the 
above design twice on two sets of two hexagons. For 
both sets, the NIST indenter was assigned as the A 
indenter. Then, in the first set, indenters 1 and 2 were 
assigned to B and C, respectively, and in the second set, 
indenters 3 and 4 were assigned to B and C, respectively. 
The measurements are shown in Fig. 7 as deviations 
from the average of the measurements with the NIST 
indenter. Dominating Fig. 7 are the differences between 
the NIST indenter and the other four. We see, for exam- 
ple, that indenter 1 gives higher hardness readings at the 
25 and 45 levels, and that indenter 2 gives lower readings 
at the 45 and 63 levels. 

Figure 7 is affected by both block nonuniformity and 
lack of repeatability. The averages of cross hexagon 
pairs are not affected by block nonuniformity. The two 
hexagons in a set are nearly replicates. Thus, the results 
for all three indenters should line up. They do not be- 
cause we centered them on the average of the measure- 
ments for the NIST indenter. The effect of the lack of 
repeatability on this centering can be taken into account 
by sliding the entire column of symbols up and down to 
achieve a better match. This is effective in some cases 
such as the rightmost two columns. Obtaining the best 
estimates of differences and an estimate of the standard 
deviation requires the regression methodology described 
above. We applied this methodology to each set of two 
hexagons and pooled the standard deviation estimates 
from the two sets. The results of this analysis in terms 
of differences from the NIST indenter are given in Table 
1. The uncertainties are standard deviation estimates 
each with 12 degrees of freedom. One can use these to 
obtain confidence intervals. 
Table 1. Differences from NIST indenter of indenters 1 to 4, each
given with 1 standard uncertainty, and the estimated standard
deviation of the lack of repeatability (values in HRC)

                HRC25             HRC45             HRC63
Indenter 1      0.742 ± 0.021     0.532 ± 0.016    -0.008 ± 0.018
Indenter 2      0.111 ± 0.022    -0.182 ± 0.017    -0.274 ± 0.019
Indenter 3      0.357 ± 0.021     0.161 ± 0.016    -0.113 ± 0.018
Indenter 4      0.338 ± 0.022     0.399 ± 0.017     0.156 ± 0.019
Std. Dev. s     0.033             0.025             0.028


Table 2. Design for comparison of indenters A to G

Hexagon    1, 4 pair    2, 5 pair    3, 6 pair
1          A            B            D
2          B            C            E
3          C            D            F
4          D            E            G
5          E            F            A
6          F            G            B
7          G            A            C


3.7 More Elaborate Comparisons 

Someone who is familiar with the matrix notation for 
multiple regression can generalize the foregoing to ex- 
perimental designs involving different sets of compari- 
sons or different numbers of hexagons. As an example, 
consider the following rigorous comparison of seven 
indenters labeled A through G. Generally, seven 
hexagons can be laid out on a block. One could assign 
the indenters to the across-hexagon pairs as shown in 
Table 2. One would also want to assign the center points 
of the hexagons. This plan, without the center points, is 
called a balanced incomplete block plan in the experi- 
mental design literature. In that literature, a block is a 
hexagon, not a test block. Analysis of data from such a 
design can be done through generalization of the multi- 
ple regression approach discussed in Sec. 3.6. 
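
The balance property of such a plan can be checked with a few lines of Python. The sketch below assumes the cyclic assignment in which hexagon h receives indenters h, h+1, and h+3 (counting modulo 7) on its three across-hexagon pairs, which is our reading of Table 2; the variable names are illustrative.

```python
from itertools import combinations

indenters = "ABCDEFG"

# Hexagon h (0-based) gets indenters h, h+1, h+3 (mod 7)
# on its 1-4, 2-5, and 3-6 pairs, respectively.
design = [[indenters[(h + s) % 7] for s in (0, 1, 3)] for h in range(7)]

# Count how often each pair of indenters shares a hexagon.
pair_counts = {}
for row in design:
    for a, b in combinations(sorted(row), 2):
        pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

for number, row in enumerate(design, start=1):
    print(number, row)
print(len(pair_counts), set(pair_counts.values()))
```

Every one of the 21 possible indenter pairs shares exactly one hexagon, which is what makes the plan balanced: each difference between two indenters is estimated with the same precision.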



3.8 Reflections on Methodology 

Constant hardness gradients across 6 mm hexagons 
only approximate block nonuniformity, although the ap- 
proximation is useful as the methods in this section 
show. Referring to Fig. 3, one sees that hardness as a 
function of location is, at least in part, smoothly varying 
but this variation is only approximately planar across 
6 mm hexagons. Thus, to some extent, block nonunifor- 
mity affects the results obtained with the methods in 
this section. The methods in Sec. 4 are based on a more 
realistic model. In this subsection, we consider the con- 
stant-gradient approximation in terms of the more realis- 
tic model used in the next section. 

The model for block nonuniformity presented in Sec. 
4 pictures the hardness variation as composed of two 
components, a smooth but curvilinear function and an 
irregularly varying function that appears to have no 
spatial continuity. The smooth component is reduced 
but not eliminated through the trend elimination meth- 
ods presented in this section. The irregular component 
cannot be clearly distinguished from the lack of re- 
peatability of the testing machine. Because a hardness 
measurement interferes with a subsequent measurement 
that is too close, one cannot tell whether what appears 
to be lack of repeatability is in part small-scale variation 
in the test block that does not appear smooth at permis- 
sible distances between measurements. In the use of the 
methods in this section, one can proceed as though what 
is observed is all due to lack of repeatability but realize 
that the resulting assessment of this uncertainty compo- 
nent may be too large. The model of block nonunifor- 
mity in the next section is actually a combined model of 
block nonuniformity and lack of repeatability. 

In Sec. 4, we model block nonuniformity probabilisti- 
cally. Such a model is supported by contour plots such 
as Fig. 3 obtained for other blocks. Comparing these 
contour plots shows that the hardness variation is largely 
smooth but shows little similarity from block to block. 
Thus, we treat the block nonuniformity as a largely 
smooth random function and estimate the covariance 
properties of this random function. In fact, the certifi- 
cates that accompany NIST's test blocks present such 
estimates [5]. If one had such estimates for one's blocks, 
then one could estimate the part of the block nonunifor- 
mity that remains after trend elimination. Generally, of 
course, covariance properties are not available for the 
test blocks one is using and therefore, one would have to 
estimate them. 

There are cases in which the irregular component in 
block nonuniformity might be distinguished from other 
error sources that influence the repeatability. Pre- 
sumably, the irregular component has a constant vari- 
ance for all blocks in a single manufacturing batch. 
Consider first the case of two machines. If, for a given 
batch of blocks, one machine has better repeatability 
than the other, then one can conclude that the extra 
variation observed in the poorer machine is due to error 
sources other than the irregular component of the block 
nonuniformity. Consider second the case of two batches 
of blocks. It seems possible that the same machine could 
exhibit different repeatability on two different batches 
of blocks. One could attribute such a difference to dif- 
ferences between the irregular components of the 
batches. One might ask whether different types of steel 
were used for the different batches, for example. 

The methods in this section offer a solution to a prob- 
lem in the use of several blocks in the same experiment. 
The problem arises when the measurement locations on 
a block are treated as randomly chosen. Even if such 
choice is properly made through designation of a set of 
locations on a block and random selection among these 



locations, the fact that the variance varies from block to 
block remains. The problem is that the variance induced 
by random selection depends on the nonuniformity of 
the block. The methods in this section remove the gross 
features of block nonuniformity and thus allow one to 
assume that whatever residual there is, it can be re- 
garded as similarly distributed for every block in a man- 
ufacturing batch. Thus, the methods in this section allow 
pooling of variance estimates. Such pooling is also part 
of the estimation NIST uses in its block certification. 
Such pooling is especially valuable in the use of several 
blocks to apply control chart methodology over a long 
period. 



4. Comparison with NIST 

We now consider experimental methods based on the 
NIST SRMs. As shown in Sec. 3, there is much one can 
learn about a measurement system using any good-qual- 
ity test blocks. More can be learned from NIST SRMs. 
In particular, judging the difference between what one's 
system does and ideal execution of the Rockwell C Scale 
method is best done with NIST SRMs. 

Test blocks, including NIST's, are nonuniform. Think 
of the test surface of a NIST block as a collection of 
measurement locations. The metal at each of these loca- 
tions will have a hardness value that differs slightly from 
the hardness values at other locations. If it were possible 
to determine the hardness at every location, then we 
would have hardness as a function of location. The func- 
tion we would see would be largely smooth although 
perhaps also with a rapidly varying component. We ex- 
pect the smooth part to be dominant because block 
nonuniformity is largely due to the nature of the steel 
and the test block manufacturing process, factors that 
vary smoothly across the block. NIST measured hard- 
ness as a function of location on several test blocks and 
found this to be true. Moreover, we found that these 
blocks have different hardness functions that are how- 
ever, similar in their smoothness. Thus, a model that 
portrays each block as having a different but similarly 
smooth hardness function seems reasonable. 

Generally, the user of a NIST SRM asks for the differ- 
ence in hardness readings between the user's own equip- 
ment and ideal equipment. The user might also ask for 
the difference between the user's equipment and NIST's. 
The day that NIST measures a user's SRM block at the 
seven locations, it could also make measurements at 
many more locations. The user can ask what NIST 
would have obtained at other locations, in particular, at 
the locations of the user's measurements. We now con- 
sider this question and answer it by providing a predic- 
tion of what NIST would have observed the same day as 
well as an uncertainty for this prediction. Note that this 
is not the same as a prediction of what NIST would have 
observed on a different day, which involves the NIST 
reproducibility. Also, note that in a comparison with 
user measurements, the user repeatability and repro- 
ducibility must be considered. We return to these uncer- 
tainty components in the next section. Because NIST 
could have continued to make measurements on an SRM 
block after it had made the initial seven, we can think in 
terms of a function of location $s$, $H(s)$, the value NIST 
would have observed at s the day it made the initial seven 
measurements. 

The method we use to predict hardness values at 
untested locations is based on a geostatistics formula 
that models the hardness across the surface of a block as 
a random function described by a semivariogram [9]. 
The semivariogram is a mathematical model that de- 
scribes how the measured hardness difference between 
any two test block locations relates to the physical spac- 
ing that separates them. In statistical terms, this semivar- 
iogram gives one half the variance of the hardness dif- 
ference between any two locations on the test block. 
Thus, the square root of twice the semivariogram is the 
standard deviation of this difference. 

We can obtain a semivariogram estimate from the
measurements made on block 95N63001, which we
have already depicted in Figs. 2 and 3. If we let $i$ index
these measurements, then for each value of $i$ we have a
hardness value $H_i$ and a location given by the coordinate
values $x_i$ and $y_i$. The semivariogram characterizes
differences between these measurements as a function of the
distance between them

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} .$$

The semivariogram for a particular distance is estimated
from the hardness differences for points that distance
apart. Let $\hat{\gamma}(d)$ denote the estimated semivariogram for
distance $d$. We have

$$2\hat{\gamma}(d) = \frac{1}{n_d} \sum (H_i - H_j)^2 ,$$

where the sum is over all the pairs of measurements $(i, j)$
for which $d = d_{ij}$ and $n_d$ is the number of pairs in the
sum. We see that the estimated semivariogram is one
half the average of hardness differences squared and
thus a variance estimate.
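
A minimal sketch of this estimate in Python is given below. It groups pairs by exact distance after rounding, which suits measurements taken on a regular grid; the function name, the `decimals` argument, and the four demonstration readings are assumptions for illustration, not data from the paper.

```python
import numpy as np

def empirical_semivariogram(x, y, h, decimals=6):
    """Return {distance: one half the mean squared hardness difference}."""
    x, y, h = (np.asarray(a, dtype=float) for a in (x, y, h))
    i, j = np.triu_indices(len(h), k=1)           # each unordered pair once
    dist = np.round(np.hypot(x[i] - x[j], y[i] - y[j]), decimals)
    sq_diff = (h[i] - h[j]) ** 2
    result = {}
    for d in np.unique(dist):
        result[float(d)] = 0.5 * sq_diff[dist == d].mean()
    return result

# Hypothetical readings (HRC) at the corners of a 5 mm square.
xs = [0.0, 5.0, 0.0, 5.0]
ys = [0.0, 0.0, 5.0, 5.0]
hs = [63.0, 63.1, 63.2, 63.3]
gamma_hat = empirical_semivariogram(xs, ys, hs)
print(gamma_hat)
```

For irregularly spaced measurements, one would bin the distances rather than match them exactly; that variation is not shown here.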

A semivariogram characterizes the smoothness of the
hardness measurements across a surface through the
way it decreases with decreasing distance. The
semivariogram estimate obtained from block 95N63001 is
shown in Fig. 8. The important part of this
semivariogram is the part for distances less than 26 mm, half the
diameter of the block. The values for distances near
50 mm are estimated from the few pairs of points that
span the block. That these values tend toward zero is the
result of the concave hardness surface shown in Fig. 3,
which is peculiar to this block. For distances less than
26 mm, the semivariogram decreases with decreasing
distance as expected from the smoothness of the block
nonuniformity. Note that the smallest distance is 5 mm,
the smallest distance possible with a 5 mm grid. We
would like to know the value that the semivariogram
approaches as the distance approaches zero because this
value characterizes the irregular component in the block
nonuniformity and the lack of repeatability. One can
estimate this value by extrapolating the values in Fig. 8
to zero. On this basis, Fig. 8 suggests that the
repeatability of the NIST machine is very good. The value
obtained by extrapolating to zero is, after taking the square
root to convert it to a standard deviation, comparable to
the repeatability results given in Sec. 3.3.

Fig. 8. Empirical semivariogram for block 95N63001 (one half the mean square of differences).

We estimated a semivariogram function for each SRM 
hardness range and use the result in our prediction of 
hardness values at unmeasured locations. This estima- 
tion is based on several blocks, not just one. An appro- 
priate estimation algorithm is given by Curriero and 
Lele [10]. The algorithm actually used for the first batch 
of NIST Rockwell C scale SRMs is somewhat different 
but the resulting estimates are nearly the same. We fit an 
exponential semivariogram model to the data [9]. This 
model is given by 



$$\gamma(d) = \begin{cases} 0 & \text{if } d = 0 \\ c_0 + c_e \left(1 - \exp(-d/a_e)\right) & \text{if } d > 0 . \end{cases}$$



The estimates are shown in Fig. 9. Note that these esti- 
mated functions provide an extrapolation to zero dis- 
tance. Other extrapolations are possible. The way the 
zero value varies with hardness range raises an interest- 
ing question. Are there sources of error in NIST's testing 
machine that are more severe for softer blocks or is there 
some variation in the blocks at distances much smaller 
than 5 mm that is more severe for softer blocks? Our 
data does not allow us to distinguish these alternatives. 
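
The exponential model can be written as a short Python function. The parameter names `c0`, `ce`, and `ae` are our reading of the model's three constants, and the numerical values below are hypothetical, chosen only to show the behavior near zero distance; they are not the certified SRM values.

```python
import numpy as np

def exp_semivariogram(d, c0, ce, ae):
    """Exponential semivariogram: 0 at d = 0, c0 + ce*(1 - exp(-d/ae)) for d > 0."""
    d = np.asarray(d, dtype=float)
    return np.where(d > 0, c0 + ce * (1.0 - np.exp(-d / ae)), 0.0)

# Hypothetical parameters: c0, ce in HRC squared, ae in mm.
c0, ce, ae = 0.0004, 0.002, 10.0
distances = np.array([0.0, 1e-9, 5.0, 10.0, 26.0])
values = exp_semivariogram(distances, c0, ce, ae)

# Square root of the zero-distance limit: a standard deviation that lumps
# together the irregular block component and the lack of repeatability.
sd_at_zero = np.sqrt(c0)
print(values, sd_at_zero)
```

The jump from 0 at d = 0 to c0 just above zero is the extrapolated value discussed in the text; the model cannot say how much of it belongs to the machine and how much to the block.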
Consider now prediction based on the seven NIST
measurements, which are located at the vertices and at
the center of a 20 mm hexagon. Let these initial
locations be $s_{01}, \ldots, s_{0n_0}$, where $n_0 = 7$. The values $H(s_{01}), \ldots,
H(s_{0n_0})$ are provided on the SRM certificate. Consider
another $n$ points, $s_1, \ldots, s_n$. Of course, these $n + 7$
locations are subject to the minimum spacing requirements.
We wish to predict

$$H_{\mathrm{pred}} = \frac{1}{n} \sum_{k=1}^{n} H(s_k) .$$

This formulation includes prediction for a single point
($n = 1$). We consider the more general case because
users often make groups of measurements on test
blocks. We compute not only a single-value prediction
for this quantity, $\hat{H}_{\mathrm{pred}}$, but also the prediction variance,
$\sigma^2_{\mathrm{pred}}$. This variance corresponds to the first source of
uncertainty listed on the NIST certificate. Put together,
we can obtain prediction intervals, for example, a 95 %
interval $(\hat{H}_{\mathrm{pred}} - 1.96\sigma_{\mathrm{pred}},\ \hat{H}_{\mathrm{pred}} + 1.96\sigma_{\mathrm{pred}})$.

The prediction $\hat{H}_{\mathrm{pred}}$, which is a linear combination of
the measurements on the certificate, is given by

$$\hat{H}_{\mathrm{pred}} = \sum_{i=1}^{n_0} \lambda_i H(s_{0i}) ,$$

where

$$\sum_{i=1}^{n_0} \lambda_i = 1 .$$

The predictions are based on the semivariograms given
on the SRM certificates. For the first SRM batch, they
are shown in Fig. 9. These semivariograms follow the
exponential model given above and are functions of
distance on the surface of the block. For convenience, we
change our notation for the semivariogram. For two
locations, $s_a$ and $s_b$, separated by $\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$,
we denote the value of the semivariogram by
$\gamma(s_a - s_b)$ instead of by $\gamma(\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2})$.

Fig. 9. Semivariogram functions applicable to the first issue of the NIST SRMs.

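Computation of the coefficients and the prediction
variance consists of four steps. The first step is inversion
of the $n_0 \times n_0$ matrix $\Gamma$ with $i,j$ element $\gamma(s_{0i} - s_{0j})$. Let
the elements of the inverse $\Gamma^{-1}$ be denoted $g_{ij}$. This
inverse matrix depends only on the semivariogram for
the points measured by NIST and therefore can be
computed once for each hardness level. The second step is
computation of

$$\gamma_i = \frac{1}{n} \sum_{k=1}^{n} \gamma(s_{0i} - s_k), \quad i = 1, \ldots, n_0 .$$

The third step is computation of three quadratic forms

$$G_{22} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} \gamma_i\, g_{ij}\, \gamma_j , \qquad
G_{12} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} g_{ij}\, \gamma_j , \qquad
G_{11} = \sum_{i=1}^{n_0} \sum_{j=1}^{n_0} g_{ij} .$$

The final step is computation of the coefficients that
multiply the NIST measurements $H(s_{0i})$ to form the
prediction $\hat{H}_{\mathrm{pred}}$ and the prediction variance

$$\lambda_i = \sum_{j=1}^{n_0} g_{ij}\, \gamma_j + \frac{1 - G_{12}}{G_{11}} \sum_{j=1}^{n_0} g_{ij} ,$$

$$\sigma^2_{\mathrm{pred}} = G_{22} - \frac{(1 - G_{12})^2}{G_{11}} - \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} \gamma(s_k - s_l) .$$
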

Put together, these two quantities give a prediction
interval for the average value NIST would have obtained on
the day the SRM measurements were obtained. For
example, a 95 % interval is given by

$$\hat{H}_{\mathrm{pred}} \pm 1.96\,\sigma_{\mathrm{pred}} = \sum_{i=1}^{n_0} \lambda_i H(s_{0i}) \pm 1.96\,\sigma_{\mathrm{pred}} .$$
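
The four steps can be sketched in Python as an ordinary kriging computation expressed with a semivariogram. This is a hedged illustration, not the NIST certification software: the formulas in the code are the standard ordinary-kriging weights and prediction variance under the constraint that the coefficients sum to 1, and the function names, semivariogram parameters, and hardness values are hypothetical. The seven locations are the hexagon-like pattern quoted later in the text for the filled blocks.

```python
import numpy as np

def exp_semivariogram(d, c0, ce, ae):
    """Exponential semivariogram with value 0 at zero distance."""
    d = np.asarray(d, dtype=float)
    return np.where(d > 0, c0 + ce * (1.0 - np.exp(-d / ae)), 0.0)

def pairwise_dist(p, q):
    """Matrix of Euclidean distances between two lists of (x, y) points."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.hypot(p[:, None, 0] - q[None, :, 0], p[:, None, 1] - q[None, :, 1])

def krige_average(s0, h0, s_new, gamma):
    """Predict the average hardness over s_new from readings h0 at locations s0."""
    h0 = np.asarray(h0, dtype=float)
    G = np.linalg.inv(gamma(pairwise_dist(s0, s0)))         # step 1: inverse matrix
    gbar = gamma(pairwise_dist(s0, s_new)).mean(axis=1)     # step 2: averaged gammas
    G22 = gbar @ G @ gbar                                   # step 3: quadratic forms
    G12 = (G @ gbar).sum()
    G11 = G.sum()
    lam = G @ gbar + (1.0 - G12) / G11 * G.sum(axis=1)      # step 4: coefficients
    n = len(s_new)
    within = gamma(pairwise_dist(s_new, s_new)).sum() / n ** 2
    var = G22 - (1.0 - G12) ** 2 / G11 - within             # prediction variance
    return lam @ h0, var, lam

# Hypothetical semivariogram parameters and readings (HRC); locations in mm.
gamma = lambda d: exp_semivariogram(d, 0.0004, 0.002, 10.0)
s0 = [(0, 0), (-20, 0), (-10, 15), (10, 15), (20, 0), (10, -15), (-10, -15)]
h0 = [25.10, 25.02, 25.06, 25.12, 25.15, 25.11, 25.04]

pred, var, lam = krige_average(s0, h0, [(5.0, 5.0)], gamma)
interval = (pred - 1.96 * np.sqrt(var), pred + 1.96 * np.sqrt(var))
print(pred, var, interval, lam.sum())
```

Two checks are useful when adapting such a sketch: the coefficients should sum to 1, and predicting at one of the measured locations should return that measurement with zero prediction variance.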



An example of prediction for n = 1 is shown in Fig. 
10. This figure shows the predicted hardness $\hat{H}_{\mathrm{pred}}$ for 
each location on a block, actually a low-range block, 
based on the seven measurements made on this block. 
We see that in terms of gross features, the nonunifor- 
mity of this block differs from the nonuniformity of 
block 95N63001 shown in Fig. 3. 

The foregoing computational details provide little
insight into the prediction itself. As a consequence of the
semivariogram estimates and the locations of the NIST
measurements on a 20 mm hexagon, the values of $\lambda_i$ that
result from the above computation are positive (or at
least close to being positive). This implies that the
predicted value must lie between the smallest and the
largest of the NIST measurements. In the case $n = 1$,
$\hat{H}_{\mathrm{pred}}$ can be regarded as an interpolated value. In fact, if
only two points were measured ($n_0 = 2$) and the
prediction were for the point half way between them, then the
prediction would be the average of the two
measurements.

Fig. 10. Hardness values of unmeasured locations on a NIST SRM block predicted from the seven measurements NIST made.

For the purpose of correcting for machine and
indenter error as discussed in Sec. 5 and for other
metrological purposes, it is best to use $\hat{H}_{\mathrm{pred}}$. Current practice,
however, is to provide a single hardness value for a test 
block. On its certificates, NIST also provides a single 
value for a block. This certified average hardness value 
is an estimate of the hardness function integrated over 
the test block surface divided by the surface area. It is 
analogous to the certified value assigned to commer- 
cially produced hardness test blocks, which is usually 
calculated as the arithmetical average of the measure- 
ments. In the case of the NIST block, the certified aver- 
age hardness value is the average of the predicted hard- 
ness values for all test surface locations, and not the 
arithmetical average of the seven NIST measurements. 
Because the locations chosen for the seven NIST mea- 
surements provide a good representation of the range in 
surface hardness, the two averages are nearly identical 
in value. 

We measured some blocks, such as the one portrayed in
Fig. 3, more than just the seven times that is
characteristic of the SRM blocks NIST offers. For these
blocks, we can choose $n_0$ measurements, use these $n_0$
measurements to predict the other measurements, and
compare the prediction $\hat{H}_{\mathrm{pred}}$ with the actualization $H_{\mathrm{pred}}$.
The question of how well the prediction performs has 
two aspects. One is whether the prediction interval con- 
tains the actualization with the frequency implied by the 
percent confidence chosen. The other is whether sub- 
tracting the prediction from the actualization reduces 
the variation by a substantial amount. There are some 
details to be considered in this performance investiga- 
tion. We used two sets of locations in filling blocks with 
measurements. This results in what we refer to as filled 
blocks and partially filled blocks. Unfortunately, neither 
set of locations contains the SRM locations. For this 
reason, we chose other locations as the basis for the 
prediction. 

We consider the first aspect using a filled block for 
each hardness range. Because the locations actually 
measured on the SRM blocks were not measured on the 
filled blocks, we use as a basis for prediction the mea- 
surements at the points (0,0), (-20,0), (-10,15), (10,15), 
(20,0), (10,-15) and (-10,-15). For each location not in 
this set, we computed the predicted value, subtracted it 



from the actualization, and divided by the prediction
standard deviation to obtain $(H_{\mathrm{pred}} - \hat{H}_{\mathrm{pred}})/\sigma_{\mathrm{pred}}$, where
$n = 1$. Note that the actualization is the value we set out
to predict, which is the value NIST obtained. We call 
these values standardized residuals and show them in 
Fig. 11. 

The values shown in this figure should ideally lie 
outside (-1.96, 1.96) only one time in twenty. Since we 
have plotted the values versus the distance of the location 
from the center of the block, we can see that the statisti- 
cal model on which the prediction interval is based does 
not hold exactly. We see an edge effect that is not sur- 
prising when one thinks of how test blocks are manufac- 
tured. On these blocks, the hardness near the edge varies 
more rapidly with distance than is portrayed by the 
semivariogram. This is particularly true of the HRC 63 
block. The contour for this block is shown in Fig. 3. It is 
not clear what can be done about this edge effect. First, 
not all blocks have edge effects, and in fact, the contours 
for different blocks show little resemblance. Thus, use of 
a model of the block nonuniformity that takes into ac- 
count the edges does not seem worthwhile. Second, in 
finding the difference between the user's equipment and 
NIST's, the user will make more than one measurement, 
and this will relieve the problem unless the user makes 
all the measurements near the edge. We conclude that if 
the user is somewhat cautious about measurements near 
the edge of the block, the prediction approach presented 
here is serviceable. 

We investigate the second aspect of prediction perfor- 
mance using three partially filled blocks. We compare 
the variation of the actualization with the variation of the 
difference between the actualization and the prediction. 
Because of the locations measured on the partially filled 
blocks, we cannot predict on the basis of a hexagon 
pattern of points. We used instead the six points (-22, 
0), (-5, 0), (5, 0), (23, 0), (0, -15), and (0, 15), a cross 
pattern. We predicted the other points on the block for 
which we already had hardness measurements, the actualizations.
Let $S_1$ be the centered sum of squares of the
actualizations $H_{pred}$ for the locations predicted, and let $S_2$
be the centered sum of squares of the corresponding
residual values $H_{pred} - \hat{H}_{pred}$. The difference $S_1 - S_2$
shows how well the prediction corresponds to the
actualization. We compute

$$R^2 = (S_1 - S_2)/S_1.$$

For the low range, mid range, and high range, we
obtain for $R^2$ the values 0.69, 0.61, and 0.39. In the
interpretation of these values, note first that $R^2$ behaves
like the similar quantity in regression analysis; the value
1 corresponds to perfect prediction. Moreover, the value
of $R^2$ will generally be higher for blocks that are more
nonuniform. The values we obtained from the blocks
considered are evidence of how large $R^2$ might be
amidst the first batch of NIST SRMs. The values for
more uniform blocks would be smaller. What the effect
of substituting the hexagon pattern for the cross pattern
would be, we do not know. Nevertheless, we see that for
the more nonuniform blocks in a batch, prediction
reduces the variation substantially.

Fig. 11. Measured value minus predicted value divided by standard deviation of the predicted
value for a block of each hardness level.
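The ratio of centered sums of squares used above to judge how much prediction reduces the variation can be sketched in a few lines. The function names and the numbers are hypothetical; only the definition, the centered sum of squares of the actualizations compared with that of the residuals, is taken from the text.

```python
def centered_sum_of_squares(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def prediction_r_squared(actual, predicted):
    """(S1 - S2) / S1, with S1 from the actualizations and S2 from the residuals."""
    s1 = centered_sum_of_squares(actual)
    residuals = [h - p for h, p in zip(actual, predicted)]
    s2 = centered_sum_of_squares(residuals)
    return (s1 - s2) / s1


# Hypothetical values for illustration only; not data from this paper.
actual = [60.0, 61.0, 62.0, 63.0]
predicted = [60.5, 60.5, 62.5, 62.5]
print(prediction_r_squared(actual, predicted))
```

A block with little hardness variation gives a small first sum of squares, so the ratio is small even when the prediction errors are no worse, which is why the text expects lower values for more uniform blocks.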



5. Measurement Correction 
5.1 User-NIST Difference 

People who make hardness measurements should ex- 
pect their results to differ from results NIST would have 
obtained with its indenter and testing machine and, for 
this reason, should entertain correction of their measure- 
ments so that the agreement is better. The procedures in 
this section are a guide to making such corrections and, 
in addition, a guide to deciding whether such correc- 
tions are satisfactory. If they are unsatisfactory, equip- 
ment upgrades may be the only option. 

Users of NIST's Rockwell C scale SRMs can observe 
the difference between their measurements and NIST's 
at the three hardness levels for which SRMs are offered. 
More precisely, as discussed in Sec. 4, users can observe 
the difference between the average of their measurements
on an SRM block and a prediction of what NIST
would have obtained for the same locations. This section
shows how a correction of a user measurement at any 
hardness level can be obtained from the differences at 
the three available hardness levels. Correction, however, 
may not provide sufficiently small measurement uncer- 
tainty. In this section, we discuss measurement correc- 
tion including all the uncertainty components that a 
corrected measurement entails. After computing the to- 
tal uncertainty from the components, the user can de- 
cide whether measurement correction is sufficient or 
whether an equipment upgrade is needed. 

In Sec. 3.5, we expressed a hardness measurement as
the sum of three terms, $H = \mu + \delta + \varepsilon$, where $\mu$ involves
the actual hardness as well as the machine and indenter
errors. In the case of the user machine and indenter, we
use the symbol $\mu$. In the case of the NIST machine and
indenter, we use the symbol $\mu_{NIST}$. Ideally, one would
like to correct $\mu$ to the actual hardness value. However,
because the NIST SRMs involve machine and indenter
error, it is better to think of correcting $\mu$ to $\mu_{NIST}$. The
correction is

$$f(\mu_{NIST}) = \mu - \mu_{NIST},$$

and therefore, we have

$$H = \mu_{NIST} + f(\mu_{NIST}) + \delta + \varepsilon.$$

On this basis, we correct measurements to the scale
defined by the NIST machine and indenter.

The correction is determined from user measure- 
ments on a NIST block for each hardness level. The first 
question is choice of measurement locations. Actually, 
one can make measurements on a NIST SRM wherever 
one wants (subject to the minimum spacing require- 
ments) and use the equations in Sec. 4 to obtain the 
NIST certified value (with uncertainty) for the average 
of these locations. Nevertheless, some choices of loca- 
tions are better than others. Although we have not stud- 
ied this issue, a reasonable possibility seems to be an 
extension of the 6 mm hexagons discussed above. One 
can lay out 6 mm hexagons around each of the NIST 
measurements and select one or more cross-hexagon 
pairs as one's measurement locations. In comparing an 
across-hexagon average with the center value, one 
should not assume that the block nonuniformity cancels 
as we do in Sec. 3. The prediction uncertainty produced 
by the equations in Sec. 4 takes into account the fact that 
the nonuniformity does not cancel exactly. Thus, even 
though one chooses locations based on 6 mm hexagons, 
one should use Sec. 4 to obtain the NIST prediction 
interval. 

Say that one has made $n$ measurements on a NIST
SRM and that the average of these measurements is $\bar{H}$.
Moreover, say that one has followed Sec. 4 to obtain the
NIST prediction for the average hardness of these
locations, $\hat{H}_{pred}$. The difference $\bar{H} - \hat{H}_{pred}$ would not be zero
even if $\bar{H}$ had been obtained with the same machine and
indenter NIST used to certify its SRMs. Using bars to
denote averages, we obtain from the equation for $H$
given above

$$\bar{H} = \mu_{NIST} + f(\mu_{NIST}) + \delta + \bar{\varepsilon}.$$

The standard deviation of $\delta$ is $\sigma_\delta$, and the standard
deviation of $\bar{\varepsilon}$ is $\sigma_\varepsilon/\sqrt{n}$. Moreover, think of $\hat{H}_{pred}$ as
composed of three components

$$\hat{H}_{pred} = \mu_{NIST} + \delta_{NIST} + \delta_{pred},$$

where $\mu_{NIST}$ is the actual hardness reading corrupted by
the NIST machine and indenter error, $\delta_{NIST}$ reflects
NIST's lack of reproducibility, and $\delta_{pred}$ is the prediction
error discussed in Sec. 4. The standard deviation of $\delta_{NIST}$,
which we denote $\sigma_{\delta,NIST}$, is given on the NIST
certificate. We have

$$\bar{H} - \hat{H}_{pred} = f(\mu_{NIST}) + \delta + \bar{\varepsilon} - \delta_{NIST} - \delta_{pred}.$$

The combined standard deviation for the last four terms
in this equation is

$$\sigma_\Delta = \sqrt{\sigma_\delta^2 + \sigma_\varepsilon^2/n + \sigma_{\delta,NIST}^2 + \sigma_{pred}^2}.$$

Estimation of $\sigma_\varepsilon$ is discussed in Sec. 3.3; estimation of
$\sigma_\delta$ is discussed in Sec. 3.5; the value of $\sigma_{\delta,NIST}$ can be
obtained from the certificate; and computation of $\sigma_{pred}$ is
discussed in Sec. 4.
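As a minimal sketch of the combination just described, the four standard deviations can be combined in root-sum-of-squares fashion, with the repeatability term reduced by the number of user measurements averaged. The function name, argument names, and numbers are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def sigma_delta(sigma_repro, sigma_repeat, n, sigma_repro_nist, sigma_pred):
    """Combined standard deviation of the random part of H_bar - H_pred_hat.

    sigma_repro      : user lack of reproducibility (Sec. 3.5)
    sigma_repeat     : user lack of repeatability for one measurement (Sec. 3.3)
    n                : number of user measurements averaged on the block
    sigma_repro_nist : NIST lack of reproducibility (from the certificate)
    sigma_pred       : prediction standard deviation (Sec. 4)
    """
    return math.sqrt(
        sigma_repro ** 2
        + sigma_repeat ** 2 / n
        + sigma_repro_nist ** 2
        + sigma_pred ** 2
    )


# Hypothetical values in HRC, for illustration only.
print(sigma_delta(sigma_repro=0.1, sigma_repeat=0.2, n=4,
                  sigma_repro_nist=0.1, sigma_pred=0.1))
```

Only the repeatability term shrinks as more measurements are averaged; the other three terms set a floor that additional measurements on the same occasion cannot lower.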

Comparison of one's hardness system with NIST's
requires consideration of all three hardness levels for
which NIST has issued SRMs. Let $m$ index these
hardness levels. We write $\bar{H}_m - \hat{H}_{pred(m)}$, $\mu_m - \mu_{NIST(m)}$, and
$\sigma_{\Delta m}$. The random errors $\delta$, $\bar{\varepsilon}$, $\delta_{NIST}$, $\delta_{pred}$ in the
expression for $\bar{H}_m - \hat{H}_{pred(m)}$ occur at each hardness level and
thus should have the subscript $m$. Three of these random
errors, $\bar{\varepsilon}$, $\delta_{NIST}$, $\delta_{pred}$, are statistically independent from
one hardness level to another. The fourth, $\delta$, the error
associated with the user reproducibility, may not be
statistically independent from one hardness level to
another, but we assume that it is.

5.2 Curvature 

Two aspects of $f(\mu_{NIST})$ distinguish hardness measurement
correction from calibration problems in other
fields. First, over the range of Rockwell C scale hardness,
the function $f(\mu_{NIST})$ is smooth and small. This will
be true if the user's machine and indenter are reasonably
close to the Rockwell C scale prescription. On this basis,
we can replace $f(\mu_{NIST})$ with $f(\hat{H}_{pred})$. Second, the
function $f$ exhibits some curvature. For this reason, we must
include deviations of $f(\mu_{NIST})$ from linearity as an
uncertainty component.

We approximate $f(\mu_{NIST})$ by $\alpha + (\beta - 1)\mu_{NIST}$.
Because the NIST blocks have hardness values near 25, 45,
and 65 HRC, we can check this approximation by
considering the difference between the deviation for the
middle block and the average of the deviations for the
highest and lowest blocks. Let

$$r_1 = \frac{\hat{H}_{pred(2)} - \hat{H}_{pred(1)}}{\hat{H}_{pred(3)} - \hat{H}_{pred(1)}}$$

and let

$$r_3 = \frac{\hat{H}_{pred(3)} - \hat{H}_{pred(2)}}{\hat{H}_{pred(3)} - \hat{H}_{pred(1)}}.$$

If we ignore the error in these ratios, then this gauge of
curvature is given by

$$\theta = r_1(\mu_3 - \mu_{NIST(3)}) + r_3(\mu_1 - \mu_{NIST(1)}) - (\mu_2 - \mu_{NIST(2)})$$

and is estimated by

$$\hat{\theta} = r_1(\bar{H}_3 - \hat{H}_{pred(3)}) + r_3(\bar{H}_1 - \hat{H}_{pred(1)}) - (\bar{H}_2 - \hat{H}_{pred(2)}).$$

Recall from Sec. 5.1 that
$\bar{H} - \hat{H}_{pred} = f(\mu_{NIST}) + \delta + \bar{\varepsilon} - \delta_{NIST} - \delta_{pred}$
and that the combined standard deviation for the last four
terms in this equation is $\sigma_\Delta$. With the random errors taken
as independent from one hardness level to another, the
standard deviation of $\hat{\theta}$ is given by

$$\sigma_{\hat{\theta}} = \sqrt{r_1^2 \sigma_{\Delta 3}^2 + r_3^2 \sigma_{\Delta 1}^2 + \sigma_{\Delta 2}^2}.$$

Using this standard deviation, one can form a confidence
interval for $\theta$ and judge how large $\theta$ might be in
light of the various sources of random error. If $\theta$ is small
in reference to the measurement application, then a linear
measurement correction is reasonable. Because
NIST issues blocks at only three hardness levels, we
consider only the possibility of a correction linear in the
user's reading since there would be no way to check the
fit of a curvilinear correction.
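The gauge of curvature can be sketched numerically as follows. This is an illustration under stated assumptions rather than the procedure of record: the function names and numbers are hypothetical, the standard deviation assumes that the random errors are independent across the three hardness levels (as assumed in Sec. 5.1) and that the error in the ratios is ignored, and the interval uses a normal-theory multiplier of 1.96, which is a choice made here for the example.

```python
import math


def curvature_gauge(h_bar, h_pred, sigma_delta):
    """Estimate theta and its standard deviation from three SRM levels.

    h_bar       : user averages at the low, mid, and high levels
    h_pred      : NIST predictions at the same three levels
    sigma_delta : combined standard deviations sigma_Delta_m at the three levels
    """
    span = h_pred[2] - h_pred[0]
    r1 = (h_pred[1] - h_pred[0]) / span   # weight on the high-level deviation
    r3 = (h_pred[2] - h_pred[1]) / span   # weight on the low-level deviation
    d = [hb - hp for hb, hp in zip(h_bar, h_pred)]
    theta_hat = r1 * d[2] + r3 * d[0] - d[1]
    # Assumes independence across levels and error-free ratios.
    sd = math.sqrt(
        r1 ** 2 * sigma_delta[2] ** 2
        + r3 ** 2 * sigma_delta[0] ** 2
        + sigma_delta[1] ** 2
    )
    return theta_hat, sd


def confidence_interval(theta_hat, sd, k=1.96):
    """Approximate interval theta_hat +/- k * sd (k = 1.96 is an assumption)."""
    return theta_hat - k * sd, theta_hat + k * sd


# Hypothetical values in HRC, for illustration only.
theta_hat, sd = curvature_gauge([25.4, 45.1, 65.2], [25.0, 45.0, 65.0], [0.2, 0.2, 0.2])
print(theta_hat, sd, confidence_interval(theta_hat, sd))
```

When the three deviations lie exactly on a straight line in the NIST predictions, the estimate is zero, which is the sense in which it measures departure from a linear correction.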

5.3 Linear Correction 

We now consider estimating the correction and the
uncertainty associated with this estimation. First, we
calculate the average of the NIST values

$$\hat{H}_{avg} = \frac{1}{3}\sum_{m=1}^{3} \hat{H}_{pred(m)},$$

the estimated slope

$$\hat{\beta} - 1 = \frac{\sum_{m=1}^{3} (\bar{H}_m - \hat{H}_{pred(m)})(\hat{H}_{pred(m)} - \hat{H}_{avg})}{\sum_{m=1}^{3} (\hat{H}_{pred(m)} - \hat{H}_{avg})^2},$$

and the estimated intercept

$$\hat{\alpha} = \frac{1}{3}\sum_{m=1}^{3} (\bar{H}_m - \hat{H}_{pred(m)}) - (\hat{\beta} - 1)\hat{H}_{avg}.$$

Denote the hardness reading to be corrected as $U$. The
correction to be subtracted from $U$ is

$$C = [\hat{\alpha} + (\hat{\beta} - 1)U]/\hat{\beta}.$$

Thus, the corrected reading is $U - C$. In terms of the
standard deviation, the uncertainty in $C$ is given by

$$\sigma_C = \sqrt{\sum_{m=1}^{3} \left[\frac{1}{3} + \frac{(U - \hat{H}_{avg})(\hat{H}_{pred(m)} - \hat{H}_{avg})}{\sum_{m'=1}^{3} (\hat{H}_{pred(m')} - \hat{H}_{avg})^2}\right]^2 \sigma_{\Delta m}^2}.$$

5.4 Uncertainty

Consider finally the uncertainty components of $U - C$,
a user measurement corrected to the NIST scale. To
begin, one must ask about the scale to which the
uncertainty refers. Since the measurement is corrected to the
NIST scale, it is reasonable to take the NIST scale as the
truth to which the uncertainty refers. In this case, there
are four uncertainty components, the one due to user
lack of repeatability, the one due to user lack of
reproducibility, the one due to error in estimation of the linear
correction, and one that must account for the curvature
in the relation between user measurements and NIST's.
In terms of the usual uncertainty parlance, the first three
of these are Type A [4]. The standard uncertainties of
these components are given above. The uncertainty that
arises because the correction really needed is non-linear
can be gauged by $\hat{\theta}$. From the confidence interval for $\theta$,
we can develop bounds on the uncertainty due to this
source. Such bounds should be serviceable if not
completely defensible because $f(\mu_{NIST})$ could take an even
greater excursion. With these uncertainty components
assessed, one is in a position to decide whether one's
measurements after correction are sufficient for the
purpose.

The implications of $\hat{\theta}$ can be further understood in
terms of its relation to $\hat{\alpha}$ and $\hat{\beta}$. The estimates $\hat{\theta}$, $\hat{\alpha}$, and
$\hat{\beta}$ provide an approximate decomposition of
$\bar{H}_m - \hat{H}_{pred(m)}$, $m = 1, 2, 3$. This approximation is exact if
$\hat{H}_{pred(1)} = 25$, $\hat{H}_{pred(2)} = 45$, and $\hat{H}_{pred(3)} = 65$. We consider
this special case because the NIST SRMs are
approximately at these levels. With some algebra, we see that

$$\hat{\theta} = (\bar{H}_1 + \bar{H}_3)/2 - \bar{H}_2$$
$$\hat{\alpha} = (\bar{H}_1 + \bar{H}_2 + \bar{H}_3)/3 - 45\hat{\beta}$$
$$\hat{\beta} = (\bar{H}_3 - \bar{H}_1)/40$$

The decomposition is

$$\bar{H}_1 - 25 = \hat{\alpha} + 25(\hat{\beta} - 1) + \hat{\theta}/3$$
$$\bar{H}_2 - 45 = \hat{\alpha} + 45(\hat{\beta} - 1) - 2\hat{\theta}/3$$
$$\bar{H}_3 - 65 = \hat{\alpha} + 65(\hat{\beta} - 1) + \hat{\theta}/3$$

We see that the user values are decomposed into a linear
function of the NIST predictions $\hat{H}_{pred(m)}$ and our gauge of
curvature $\hat{\theta}$. Thus, $\hat{\theta}$ accounts for the part of the user
measurements not amenable to a linear correction.
Moreover, if $\sigma_{\Delta m}^2$ is the same for all $m$, then the
estimation error for $\hat{\alpha} + \hat{H}_{pred(m)}(\hat{\beta} - 1)$ is independent of the
estimation error for $\hat{\theta}$. Thus, the uncertainty in the
linear correction and the uncertainty in the gauge of
curvature when combined as the root sum of squares
gives approximately the uncertainty in $\bar{H}_m - \hat{H}_{pred(m)}$.
These results show that because of the levels of the
NIST SRMs, the quantities introduced in this section are
more simply related than might appear at first. Of
particular note is the fact that applying the linear correction
does not increase the uncertainty in user measurements.
A completely defensible recipe for incorporating the
curvature into the uncertainty does not seem possible.
Our gauge of curvature, a confidence interval for $\theta$, does
not tell us about the differences between user and NIST
measurements at hardness levels between the levels of
the NIST SRMs. These differences could be larger than
$2\theta/3$. Moreover, these differences could be material
dependent. Such dependence might be especially severe
when $\theta$ is large. To this must be added the complication
that, as shown above, the effect of the curvature on the
corrected user measurements is $\theta/3$ or $-2\theta/3$ depending
on the level of the NIST SRM. A consensus recipe for
the case of $\theta$ small may be possible, but if the
confidence interval for $\theta$ suggests that $\theta$ may be large, the
user of the uncertainty statement must be careful.

Unless all parties agree to correct their measurements 
to the NIST scale, one must consider the relation of 
measurements to the ideal Rockwell C scale. One way to 
do this is to add to the above uncertainty components, 
the uncertainty components given on the NIST certifi- 
cate for machine and indenter error. This will increase 
the total uncertainty. The advantage of agreement to 
correct to the NIST scale is that the NIST machine and 
indenter errors do not have to be considered in a com- 
parison. Of course, it would be still better to improve 
machines and indenters so that errors attributable to 
these sources are reduced. 
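The linear correction of Sec. 5.3 can be illustrated with a short sketch. The function names and the numbers are hypothetical, and the expression used for the standard deviation of the correction is the usual variance of a fitted straight line evaluated at the reading, with a separate standard deviation for each SRM level; the division by the slope estimate is ignored in that expression, which is reasonable only when the slope is close to 1.

```python
import math


def fit_linear_correction(h_bar, h_pred):
    """Least-squares intercept and slope for f ~ alpha + (beta - 1) * H_pred.

    h_bar  : user averages on the SRM blocks
    h_pred : NIST predictions for the same locations
    """
    k = len(h_pred)
    h_avg = sum(h_pred) / k
    d = [hb - hp for hb, hp in zip(h_bar, h_pred)]
    sxx = sum((x - h_avg) ** 2 for x in h_pred)
    slope_minus_1 = sum(di * (x - h_avg) for di, x in zip(d, h_pred)) / sxx
    alpha = sum(d) / k - slope_minus_1 * h_avg
    return alpha, 1.0 + slope_minus_1


def correction(u, alpha, beta):
    """Amount to subtract from a user reading u to place it on the NIST scale."""
    return (alpha + (beta - 1.0) * u) / beta


def sigma_correction(u, h_pred, sigma_delta):
    """Standard deviation of the correction at reading u (beta taken as near 1)."""
    k = len(h_pred)
    h_avg = sum(h_pred) / k
    sxx = sum((x - h_avg) ** 2 for x in h_pred)
    return math.sqrt(sum(
        (1.0 / k + (u - h_avg) * (x - h_avg) / sxx) ** 2 * s ** 2
        for x, s in zip(h_pred, sigma_delta)
    ))


# Hypothetical values in HRC, for illustration only.
h_pred = [25.0, 45.0, 65.0]
h_bar = [26.0, 46.5, 67.0]
alpha, beta = fit_linear_correction(h_bar, h_pred)
u = 46.5
print(u - correction(u, alpha, beta), sigma_correction(u, h_pred, [0.3, 0.3, 0.3]))
```

In this invented example the user deviations are exactly linear in the NIST predictions, so a reading of 46.5 is corrected to the NIST value of 45. The standard deviation of the correction is smallest near the average of the three SRM levels and grows toward the ends of the range.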



6. Summary 

Rockwell hardness occupies a preeminent position in 
mechanical testing because of its long history and wide 
use. For this reason, Rockwell hardness is the natural 
choice for a paper on detailed understanding of test 
methods. This paper provides procedures that can be 
implemented to evaluate a hardness measurement sys- 
tem, concepts that help with the investigation of hard- 
ness measurement, and an example that can guide test 
method development. 

This paper provides procedures for repeatability and 
reproducibility assessment, for control charts, for com- 
parison of systems, for use of NIST SRM test blocks, 
and for measurement correction. These procedures give 
a fine understanding, but a reader may ask whether such 
an understanding is necessary. The answer is, of course, 
that it depends on the hardness application. 

This paper explains concepts related to sources of 
measurement uncertainty, use of nonuniform blocks, the 
combined uncertainty of measurement, and evidence of 
the need for equipment upgrades. For a reader involved 
in hardness testing, these concepts are important even if 
the procedures in this paper are not implemented. For 
example, if one is puzzling over why two hardness mea- 
surements differ, one needs these concepts in thinking 
about possible causes. 



Although it does not apply to all test methods, this 
paper does apply to many tests currently under develop- 
ment. Generally, this paper applies to tests for products 
with both upper and lower specification limits. One gen- 
eral aspect of test method development discussed in this 
paper is the need to manage two distinct areas of exper- 
imental effort, assessment of measurement uncertainty 
and determination of the relation between the test 
method and the product property critical to quality. A 
second aspect of broad interest is the use of test methods 
for local measurements of surfaces. Applicable to such 
a test method use are approaches to surface variation 
such as trend elimination and spatial statistics. A third 
aspect is the need for reference materials to assure long- 
term comparability of test method results. In general, 
this paper shows how test methods can be treated with 
more care than is usual in current practice. 